Lecture 2: Introducing Python

CSCI 1360E: Foundations for Informatics and Analytics

Overview and Objectives

In this lecture, I'll introduce the Python programming language and how to interact with it; aka, the proverbial Hello, World! lecture. By the end, you should be able to:

  • Recall basic history and facts about Python (relevance in scientific computing, comparison to other languages)
  • Print arbitrary strings in a Python environment
  • Create and execute basic arithmetic operations
  • Understand and be able to use variable assignment and update

While this lecture notebook is non-interactive, it is available for you (as are all the other lectures released in this course) to download and execute as an interactive notebook. Instructions for how to do this are included in the Appendix at the end. I encourage you to give it a try!


Part 1: Background

Python's history

Python as a language was implemented from the start by Guido van Rossum [1]. What was originally something of a snarkily-named hobby project to pass the holidays turned into a huge open source phenomenon used by millions. The original project began in 1989, with the release of Python 2.0 in 2000 and Python 3.0 in 2008. As of this lecture, the latest stable releases of these branches are 2.7.11--which Guido emphatically insists is the final, final, final release of the 2.x branch--and 3.5.1 (which is what we're using in this course).

If you're (rightly!) wondering why a 2.x branch has survived a decade and a half after its initial release, there is indeed a story there. In short, Python 3 was designed as backwards-incompatible; a good number of syntax changes and other internal improvements made the majority of code written for Python 2 unusable in Python 3. We'll go over one of these examples, but suffice to say this made it difficult for power users and developers to upgrade, particularly when they relied on so many third-party libraries for much of the heavy-lifting in Python; until these third-party libraries were themselves converted to Python 3, most developers stuck with Python 2.

As a language

Python is an interpreted language. Those of you who have some prior programming experience may recognize this term; it means the code you write is directly translated and "interpreted" by the computer into 1s and 0s that are executed on the CPU. Often in contrast, compiled languages, such as C, are first compiled into binary executables of machine code that are then run by the computer.

In practice, this distinction has become blurry, particularly in the past decade. Interpreted languages in general are easier to use but run more slowly and consume more resources; compiled languages in general are more difficult to program in, but run much more efficiently. As a result of these advantages and disadvantages, modern programming languages have attempted to combine the best of both worlds:

  • Java is initially compiled into bytecode, which is then run through the Java Virtual Machine which acts as an interpreter. In this sense, it is both a compiled language and an interpreted language.
  • Python runs on a reference implementation, CPython, in which chunks of Python code are compiled into intermediate representations and executed.
  • Julia, a relative newcomer in programming languages designed for scientific computing and data science, straddles a middle ground in a different way: using a "just-in-time" (JIT) compilation scheme, whereby code is compiled as the program runs, theoretically providing the performance of compiled programs with the ease of use of interpreted programs. JIT compilers have proliferated for other languages as well, including Python.

An important point to keep in mind about Python: it is not designed as a specialized language for performing a specific task. It does not have numerical and statistical tools embedded into the core language itself; instead, the Python community relies on third-party developers to provide these extras (XKCD, of course, played off this facet of Python). Instead, as Jake VanderPlas put it [2]:

"Python syntax is the glue that holds your data science code together. As many scientists and statisticians have found, Python excels in that role because it is powerful, intuitive, quick to write, fun to use, and above all extremely useful in day-to-day data science tasks."

Zen of Python

One of the biggest reasons for Python's popularity, and for my decision in using it in this course, is its overall simplicity and ease of use, especially for individuals with little to no prior programming experience. This is no coincidence: Python was designed explicitly with this in mind. It's so central to the Python ethos, in fact, that it's baked into every Python installation. Tim Peters wrote a "poem" of sorts, The Zen of Python, that anyone with Python installed can read. To see it, just type one line of Python code:


In [1]:
import this


The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!

Lack of any discernible meter or rhyming scheme aside, it nonetheless encapsulates the spirit of the Python language. These two lines are particular favorites of mine:

If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.

The first line is straightforward enough: if you're having a hard time explaining your own code, you'd do well to go back and simplify it. If you didn't write the code and can't figure out what's going on, the person who wrote it definitely needs to go back and simplify it. Code golf may very well be how you relax and unwind after work, and that's perfectly fine. But during business hours, and especially if there's a nonzero chance of someone else having to use your code in the future, for their sake, keep it simple!

The second line is a little trickier, with that word "may" royally gumming up what was otherwise an eloquent antithesis. Of course, this is no accident: "easy to explain" is necessary for a good implementation, but it is not sufficient. Hence all the caveats in the full poem disguised as sagacious syllogisms.

Don't you just feel so zen right now?

Part 2: Hello, World!

Enough reading, time for some coding, amirite?

So what does it take to write your first program, your "Hello, World!"? Pound-include iostream dot h? Import java dot io? Define a main function with command-line parameters? Wrap the whole thing in a class?


In [2]:
print("Hello, world!")


Hello, world!

...yep! That's all there is to it.

Just for the sake of being thorough, though, let's go through this command in painstaking detail.

Functions

print() is a function. It's a construct that takes some input, performs an operation on it, and gives you back some output. You can think of it as a direct analog of the mathematical term, $f(x) = y$. In this case, $f()$ is the function; $x$ is the input, and $y$ is the output. Think of print() the same way--it's a function that Python provides, where the input is some text you want printed, and the output is (surprise!) the text you wanted printed.

Later in the course, we'll see how to create our own functions, but for now we'll make use of the ones Python provides us by default.

Arguments

"Arguments" is the formal term for what I previously referred to as "input". The parameters, or arguments, to your function are the inputs that it uses to perform whatever operation it is designed to perform. In this case, there is only one argument to print(): a string of text that we want printed out. This text, in Python parlance, is called a "string". I can only presume it is so named because it is a string of individual characters.

We can very easily change the string argument we pass to print():


In [3]:
print("This is not the same argument as before.")


This is not the same argument as before.

We could also print out an empty string, or even no string at all.


In [4]:
print("")  # this is an empty string




In [5]:
print()  # this is just nothing



In both cases, the output looks pretty much the same...because it is: just a blank line. As part of what print() does, after it finishes printing your input string, it prints one final character--a newline. This is basically the programmatic equivalent of hitting Enter at the end of a line, moving the cursor down to the start of the next line.

Please feel free to fire up this notebook in interactive mode and change the arguments to see what you get! Instructions for doing so are in the Appendix below.

(and, if you're curious about those # signs in the above code: those delineate comments. In programming parlance, "comments" are lines in your program that the language ignores entirely. When you type a # in Python, everything after that symbol on the same line is ignored. They're there purely for the developers as a way to put documentation and clarifying statements directly into the code. It's a practice I strongly encourage everyone to do--even just to remind yourself what you were thinking! I can't count the number of times I worked on code, set it aside for a month, then came back to it and had absolutely no idea what I was doing)

Part 3: Beyond "Hello, World!"

Ok, so Python can print strings. That's cool. Can it do anything that's actually useful?

Why, yes, as a matter of fact! Thanks for asking!

Python has a lot of built-in objects and data structures that are very useful for more advanced operations--and we'll get to them soon enough!--but for now, you can use Python to perform basic arithmetic operations. Addition, subtraction, multiplication, division--they're all there. You can use it as a glorified calculator:


In [6]:
3 + 4


Out[6]:
7

In [7]:
3 - 4


Out[7]:
-1

In [8]:
3 * 4


Out[8]:
12

In [9]:
3 / 4


Out[9]:
0.75

Python respects order of operations, too, performing them as you'd expect:


In [10]:
3 + 4 * 6 / 2 - 5


Out[10]:
10.0

In [11]:
(3 + 4) * 6 / (2 - 5)


Out[11]:
-14.0

Python even has a really cool exponent operator, denoted by using two stars right next to each other:


In [12]:
2 ** 3  # 2 raised to the 3rd power


Out[12]:
8

In [13]:
3 ** 2  # 3 squared


Out[13]:
9

In [14]:
25 ** (1 / 2)  # Square root of 25


Out[14]:
5.0

Now for something really neat:


In [15]:
x = 2
x * 3


Out[15]:
6

In this example, I've used a core Python construct (and really, a construct in any language) called a variable. This is where the parallels between math and computer science become much more tangible: I can create variables that, when the program runs, take on arbitrary values and perform computations on them.

In Python, anything that isn't a "reserved word"--things like print, other functions I've defined, or other words that are important to Python--is treated as a variable, and which you can therefore use to store and manipulate values. Here's an operation that involves two variables:


In [16]:
x = 2
y = 3
x * y


Out[16]:
6

We can assign the result of operations with variables to other variables:


In [17]:
x = 2
y = 3
z = x * y
print(z)


6

The use of the equals sign = is called the assignment operator. It's in the same category as the addition, subtraction, multiplication, and division operators, and works about as you'd expect: it takes whatever value is being computed on the right-hand side of the equation and assigns it to the variable on the left-hand side.

Some of you may be thinking: what happens if I perform an assignment on something that can't be assigned a different value...such as, say, a number?


In [18]:
x = 2
y = 3
5 = x * y


  File "<ipython-input-18-fa3d1a7296cd>", line 3
    5 = x * y
             ^
SyntaxError: can't assign to literal

CRASH!

Ok, not really; Python technically did what it was supposed to do. It threw an error, alerting you that something in your program didn't work for some reason. In this case, the error message is can't assign to literal.

That can be a little difficult to understand, though the SyntaxError statement beforehand gives us a bit of a clue: we did something wrong that involves Python's syntax, or the structure of its language. In this case, the "literal" being referred to is the number 5. We are attempting to assign the result of the computation of x * y to the number 5. However, 5 is known internally to Python as a "literal." This is pretty much what it sounds like--it has a literal value that cannot be changed through assignment. When we try, Python complains and crashes its program with an error message.

So we can't assign values to numbers. What about assigning values to a variable that's used in the very same calculation?


In [19]:
x = 2
y = 3
x = x * y
print(x)


6

This works just fine! In fact, it's more than fine--this is such a standard operation, it has its own operator:


In [20]:
x = 2
y = 3
x *= y
print(x)


6

If we say this out loud, it's pretty much what it looks like: "x times equals y". This is shorthand for the previous version, where we multiplied x by y and stored the value in x, effectively updating it. There are many instances where you'll want to increment a variable: for example, when counting how many of some "thing" you have.

All the other operators have the same shorthand-update versions: += for addition, -= for subtraction, and /= for division.

Review Questions

That was a bit of a whirlwind! Some questions to consider to help crystallize things:

1: Let's say you want to count the number of words in Wikipedia. You have a variable to track this count: word_count = 0. For every word you come across, you'll update this counter by 1. Using the shorthand you saw before, what would the command be to update the variable at each word?

2: What would happen if I ran this command? Explain. ("5" + 5)

3: In this lecture, we used what is essentially a Python shell in order to execute Python commands. Let's say, instead, we wanted to run a sequence of commands in a script. I've put a couple commands in the file commands.py. How would you execute this script from the command prompt?

4: What would happen if I ran this command? Explain. x = y


Course Administrivia

Appendix: Additional Resources

  1. Guido's PyCon 2016 talk on the future of Python: https://www.youtube.com/watch?v=YgtL4S7Hrwo
  2. VanderPlas, Jake. Python Data Science Handbook. 2016 (pre-release).

Making this notebook interactive

All notebook lectures will be posted to the course GitHub repository here: https://github.com/quinngroup/csci1360e-su16

In order to execute the notebooks and make them interactive (i.e., you can change the code in them and re-execute it to see the results), you have two options:

  • Do it yourself: If you've already completed A0 and opted to install the Anaconda Python environment, you can host these notebooks yourself.

    1. Download the contents of the above repository. You can do this by clicking the green "Clone or download" button and selecting "Download ZIP". Extract the archive somewhere on your computer.
    2. Open up a command prompt / Terminal window. Within the shell, navigate to the folder you just extracted.
    3. Run the command: jupyter notebook and press Enter. This will initialize the Jupyter notebook environment, and should immediately spawn a web server you can view through your browser (http://localhost:8888). Provided you're in the correct directory, you should see a file with the extension .ipynb (for "IPython Notebook, where "IPython" means "Interactive Python"). Click on that, and Jupyter should immediately render the notebook in interactive mode!
    4. When you're finished, just go back to your command prompt and press CTRL+C to kill the Jupyter process.
  • Use mybinder: A wonderful tool out of HHMI Janelia Farms, mybinder will host GitHub repositories with notebooks in them through their own servers. Since the GitHub repository is already set up, all you have to do to launch the notebooks is:

    1. Go to the csci1360e-su16 repository linked above in your web browser.
    2. Click the "launch binder" badge you see on the front page. It may take some time to spin up, but it will eventually provide you an interactive interface to the notebooks.

To interact with the notebooks, double-click on the section you want to edit. Once you've completed your edits and way to re-evaluate the cell, click the button the toolbar at the top that looks like a "play" button (just to the left of the "stop" button). The cell will execute, and any output will be displayed right below the cell.

If, instead of editing existing cells, you want to add new ones: click the "Insert" drop-down menu and select where you'd like to add a new cell. You can then select what type of cell you want with the selectable drop-down; if you're writing Python, you'll want to select the Code option.